Patent abstract:
Systems and methods for processing 360-degree video data are provided. In several implementations, a spherical representation of a 360-degree video frame can be segmented into a top region, a bottom region and a middle region. The middle region can be mapped into one or more rectangular areas of an output video frame. The top region can be mapped to a first rectangular area of the output video frame using a mapping that converts a square to a circle, so that the pixels in the top circular region are expanded to fill the first rectangular area. The bottom region can be mapped to a second rectangular area of the output video frame, so that the pixels in the bottom circular region are expanded to fill the second rectangular area.
Publication number: BR112019019191A2
Application number: R112019019191
Filing date: 2018-03-21
Publication date: 2020-04-22
Inventors: Van Der Auwera Geert; Karczewicz Marta; Coban Muhammed
Applicant: Qualcomm Inc;
Primary IPC:
Patent description:

SPHERE POLE PROJECTIONS FOR EFFICIENT 360-DEGREE VIDEO COMPRESSION
BACKGROUND [0001] Virtual reality (VR) describes a three-dimensional, computer-generated environment that can be interacted with in a seemingly real or physical way. In general, a user experiencing a virtual reality environment can turn left or right, look up or down, and/or move forward and back, thus changing their view of the virtual environment. The 360-degree video presented to the user can change accordingly, so that the user's experience is as seamless as in the real world. Virtual reality video can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience.
[0002] To provide a full 360-degree view, the video captured by a 360-degree video capture system is typically subjected to image stitching. Image stitching, in the case of 360-degree video generation, involves combining or merging video frames from adjacent cameras in the area where the video frames overlap or otherwise connect. The result is an approximately spherical frame of video data. Similar to a Mercator projection, however, the merged data is typically represented in a planar form. For example, the pixels in a merged video frame can be mapped to the planes of a cubic configuration or some other
three-dimensional planar configuration (such as a pyramid, octahedron, decahedron, etc.). Video capture and video display devices generally work on a raster principle, meaning that a video frame is treated as a grid of pixels, so square or rectangular planes are typically used to represent a spherical environment.
[0003] 360-degree video can be encoded for storage and/or transmission. Video coding standards include International Telecommunication Union (ITU) ITU-T H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group (MPEG) MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), which includes Scalable Video Coding (SVC) and Multiview Video Coding (MVC), and ITU-T H.265 (also known as High Efficiency Video Coding, HEVC), with their extensions.
SUMMARY [0004] In several implementations, techniques and systems for processing 360-degree video data are described to obtain better coding efficiency. These techniques and systems may include using a segmented spherical projection to divide a spherical representation of a 360-degree video frame into a north pole or top region, a south pole or bottom region and a middle or equatorial region. The regions can then be mapped to a two-dimensional rectangular shape that can be more easily manipulated by
coding devices. In generating this mapping, the circular north and south pole regions of the segmented spherical projection can be expanded, using various techniques, to fill a rectangular region of the output video frame. By mapping the polar regions into all corners of a rectangular region, all available pixels in the output video frame can include usable data. A video frame generated in this way can be encoded more efficiently than 360-degree video frames that have been generated using other methods.
[0005] In several implementations, additional visual improvements can be achieved by applying a gradual sampling adjustment in certain areas of the output video frame. For example, any discontinuity between a rectangular region into which a polar region has been mapped and a rectangular region into which part of the equatorial region has been mapped can be reduced by applying a gradual change to the locations in the video frame to which the samples are mapped. In this and other examples, the gradual change is applied to the rectangular region to which a polar region of the spherical video data is mapped.
[0006] According to at least one example, a method for encoding video data is provided. In several implementations, the method includes obtaining 360 degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The method includes
additionally segmenting a video frame of the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the spherical representation, where the middle region includes an area of the spherical representation not included in the top or bottom region. The method further includes mapping the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area. The method further includes mapping the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[0007] In another example, a device is provided that includes a memory configured to store 360-degree video data and a processor. 360 degree video data can include a plurality of video frames, each video frame of the plurality of video frames includes a spherical representation of video data for the video frame. The processor is configured to and can segment a video frame from the plurality of video frames in a top region, a middle region and in a bottom region, the top region including a first circular area of the spherical representation, the lower region including a second circular area of the
spherical representation that is opposite the first circular area on the spherical representation, in which the middle region includes an area of the spherical representation not included in the top or bottom region. The processor is configured to and can map the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area. The processor is configured to and can map the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[0008] In another example, a non-transitory computer-readable medium is provided that has instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations that include obtaining 360-degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. The instructions can additionally cause the one or more processors to perform operations that include segmenting a video frame of the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the
spherical representation, in which the middle region includes an area of the spherical representation not included in the top or bottom region. The instructions can additionally cause the one or more processors to perform operations that include mapping the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area. The instructions can additionally cause the one or more processors to perform operations that include mapping the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[0009] In another example, an apparatus is provided that includes means for obtaining 360 degree video data that includes a plurality of video frames, each video frame of the plurality of video frames includes a spherical representation of video data for the video frame. The apparatus further comprises means for segmenting a video frame from the plurality of video frames in a top region, a middle region and in a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite to the spherical representation of the first circular area, where the middle region includes an area of the spherical representation not included in the top or bottom region. The apparatus comprises
additionally means for mapping the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area. The apparatus further comprises means for mapping the bottom region to a second rectangular area of the output video frame, wherein mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[0010] In some respects, the video frame is segmented at a first latitude above an equator of the spherical representation and a second latitude below the equator, where the first latitude and the second latitude are equidistant from the equator, where the top region is above the first latitude and the bottom region is below the second latitude.
[0011] In some ways, mapping the top region and mapping the bottom region includes selecting a pixel location on the output video frame and determining a point on the spherical representation that corresponds to the pixel location, where the point on the spherical representation is determined using a mapping to convert a square into a circle. These aspects additionally include sampling a pixel from the point on the spherical representation and placing the sampled pixel at the pixel location. In some ways, mapping to convert a square to a circle minimizes distortion in the output video frame. In some ways, mapping the top region and
mapping the bottom region also includes adjusting the pixel location using a gradual curve function. In some ways, the gradual curve function is used at pixel locations in an area adjacent to a third rectangular area in the video frame. In some ways, the gradual curve function changes pixel locations less towards an area in the middle of the first rectangular area or the second rectangular area and more towards an area outside the first rectangular area or the second rectangular area.
[0012] In some aspects, the methods, apparatuses and computer-readable medium described above additionally include mapping the middle region to one or more additional rectangular areas of the output video frame. In some aspects, the middle region includes a left view, a front view and a right view, in which the left view is located in the output video frame adjacent to the front view and in which the right view is
located adjacent to the front view.
[0013] In some respects, the middle region includes a rear view, where the bottom region is located on the output video frame adjacent to the rear view, and where the top region is located adjacent to the rear view.
[0014] In some ways, mapping the top region to the first rectangular area includes applying a gradual adjustment in an area where the first rectangular area is adjacent to a third rectangular area in the output video frame, and mapping the bottom region to the second rectangular area includes applying the gradual adjustment
in an area where the second rectangular area is adjacent to a fourth rectangular area in the output video frame.
[0015] In some ways, the output video frame has a three-by-two aspect ratio.
[0016] According to at least one example, a method for decoding video data is provided. In several implementations, the method includes obtaining 360-degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a two-dimensional representation of video data for the video frame. The method further includes identifying a first rectangular area of a video frame from the plurality of video frames. The method further includes mapping the first rectangular area to a top region of a spherical representation of the video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data from the first rectangular area in the first circular area. The method further includes identifying a second rectangular area of the video frame. The method further includes mapping the second rectangular area to a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[0017] In another example, a device is provided that includes a memory configured to store 360-degree video data and a processor. The data from
360-degree video can include a plurality of video frames, each video frame of the plurality of video frames including a two-dimensional representation of video data for the video frame. The processor is configured to and can identify a first rectangular area of a video frame from the plurality of video frames. The processor is configured to and can map the first rectangular area to a top region of a spherical representation of the video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data from the first rectangular area in the first circular area. The processor is configured to and can identify a second rectangular area of the video frame. The processor is configured to and can map the second rectangular area to a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[0018] In another example, a non-transitory computer-readable medium is provided that has instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform operations that include obtaining 360-degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a two-dimensional representation of video data for the video frame. The instructions can
additionally cause the one or more processors to perform operations that include identifying a first rectangular area of a video frame from the plurality of video frames. The instructions can additionally cause the one or more processors to perform operations that include mapping the first rectangular area to a top region of a spherical representation of the video data for the video frame, where mapping the first rectangular area includes arranging the video data from the first rectangular area in the first circular area. The instructions can additionally cause the one or more processors to perform operations that include identifying a second rectangular area of the video frame. The instructions can additionally cause the one or more processors to perform operations that include mapping the second rectangular area to a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data from the second rectangular area in the second circular area.
[0019] In another example, an apparatus is provided that includes means for obtaining 360 degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a two-dimensional representation of video data for the video frame. The apparatus further comprises means for identifying a first rectangular area of a video frame from the plurality of video frames. The apparatus additionally comprises
means for mapping the first rectangular area to a top region of a spherical representation of the video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data from the first rectangular area in the first circular area. The apparatus further comprises means for identifying a second rectangular area of the video frame. The apparatus further comprises means for mapping the second rectangular area to a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[0020] In some respects, the top region includes a surface of the spherical representation above a first latitude of the spherical representation, where the bottom region includes a surface of the spherical representation below a second latitude of the spherical representation, where the first latitude and second latitude are equidistant from an equator of spherical representation.
[0021] In some ways, mapping to the one or more rectangular areas includes selecting a point on the spherical representation and determining a pixel location in the video frame that corresponds to the point, where the pixel location is determined using a mapping for converting a three-dimensional sphere to a two-dimensional rectangle. These aspects additionally include sampling a pixel from the pixel location and
locating the sampled pixel at the point.
[0022] In some ways, mapping the first rectangular area and mapping the second rectangular area includes selecting a point on the spherical representation and determining a pixel location in the video frame that corresponds to the point, where the pixel location is determined using a mapping to convert a circle to a square. These aspects additionally include sampling a pixel from the pixel location and placing the sampled pixel at the point. In some ways, the mapping to convert a circle to a square reverses the distortion caused when the video data in the first rectangular area or the second rectangular area has been expanded to fill the first rectangular area or the second rectangular area. In some ways, mapping the first rectangular area and mapping the second rectangular area additionally includes adjusting the pixel location using a gradual curve function. In some ways, the gradual curve function is used to locate a pixel in an area adjacent to at least one of the one or more additional rectangular areas. In some ways, the gradual curve function changes the pixel locations less towards an area in the middle of the first rectangular area or the second rectangular area and more towards an area outside the first rectangular area or the second rectangular area.
[0023] In some respects, the methods, apparatuses and computer-readable medium additionally include mapping one or more additional rectangular areas of the video frame to a middle region of the
spherical representation. In some respects, one or more additional rectangular areas include a left view, a front view and a right view, where the left view is located adjacent to the front view and where the right view is adjacent to the front view.
[0024] In some respects, one or more additional rectangular areas include a rear view, in which the first rectangular area is adjacent to the rear view and in which the second rectangular area is adjacent to the rear view.
[0025] In some ways, mapping the first rectangular area to the top region includes applying a gradual adjustment in an area where the first rectangular area is adjacent to a third rectangular area of the one or more additional rectangular areas, and mapping the second rectangular area to the bottom region includes applying a gradual adjustment in an area where the second rectangular area is adjacent to a fourth rectangular area of the one or more additional rectangular areas.
[0026] This summary is not intended to identify key or essential features of the claimed object, nor is it intended to be used in isolation to determine the scope of the claimed object. The object should be understood, by way of reference, in relation to the appropriate parts of the specification, to any or all of the drawings, and to each claim of this patent as a whole.
[0027] The foregoing, together with other features and embodiments, will become more apparent upon
reference to the specification below, claims and attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS [0028] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
[0029] Illustrative embodiments of the present disclosure are described in detail below with reference to the following drawing figures:
[0030] Figure 1A illustrates a video frame that includes an equirectangular projection of a 360-degree video frame.
[0031] Figure 1B illustrates a video frame that includes a cubic map projection of a 360 degree video frame.
[0032] Figure 2A is a diagram illustrating a segmented spherical projection of the surface of a sphere for vertical mapping.
[0033] Figure 2B is a diagram illustrating an exemplary video frame generated using a 3 x 2 layout of the mappings that can be generated using the segmented spherical projection.
[0034] Figure 3 is a diagram that illustrates an example of mapping a circle to a square and a square to a circle.
[0035] Figure 4 is a diagram illustrating an exemplary output for various techniques for mapping a square to a circle and a circle to a
square.
[0036] Figure 5A and Figure 5B are diagrams illustrating polar regions of exemplary spherical video data that were mapped using an angular fisheye projection.
[0037] Figure 6A and Figure 6B are diagrams that illustrate polar regions of exemplary spherical video data that have been mapped using the techniques discussed here.
[0038] Figure 7 illustrates an exemplary video frame generated by mapping a 360-degree video frame that uses a segmented spherical projection and techniques discussed here.
[0039] Figure 8 illustrates a first exemplary partial video frame that was mapped without using the gradual transition technique discussed above, and a second partial video frame that was mapped according to the gradual transition technique.
[0040] Figure 9 illustrates a graph on which the outputs of a function were plotted.
[0041] Figure 10 is a flow diagram that illustrates an exemplary process for processing video data according to the techniques discussed here.
[0042] Figure 11 is a flow diagram that illustrates an exemplary process for processing video data according to the techniques discussed here.
[0043] Figure 12 is a block diagram illustrating an exemplary coding device.
[0044] Figure 13 is a block diagram illustrating an exemplary decoding device.
DETAILED DESCRIPTION [0045] Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments can be applied independently and some of them can be applied in combination, as would be evident to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be evident that various embodiments can be practiced without these specific details. The figures and description are not intended to be restrictive.
[0046] The following description only provides examples and is not intended to limit the scope, applicability or configuration of the disclosure. Rather, the subsequent description of several examples will provide those skilled in the art with a description that allows implementing any of the examples. It should be understood that several changes in the function and arrangement of the elements can be made without departing from the spirit and scope of the invention, as established in the appended claims.
[0047] Specific details are given in the following description to provide a complete understanding of the examples. However, as will be understood by anyone skilled in the art, examples can be practiced without these specific details. For example, circuits, systems, networks, processes and other components can be shown as components in the form of a block diagram in order not to obscure the examples in detail
unnecessarily. In other cases, well-known circuits, processes, algorithms, structures and techniques may be shown without unnecessary detail in order to avoid obscuring the examples.
[0048] Also, it should be noted that individual examples can be described as a process that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram or a block diagram. Although a flowchart can describe operations as a sequential process, many operations can be carried out in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but there may be additional steps not included in a figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or to the main function.
[0049] The term computer-readable medium includes, but is not limited to, portable or non-portable storage devices, optical storage devices and various other mediums capable of storing, containing or carrying instruction(s) and/or data. A computer-readable medium may include a non-transitory medium in which data can be stored and which does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium may include, but are not limited to, a magnetic disk or tape,
optical storage media such as compact disc (CD) or digital versatile disc (DVD), flash memory, memory or memory devices. A computer-readable medium may have stored thereon code and/or machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class or any other combination of instructions, data structures or program statements. A code segment can be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission or the like.
[0050] Furthermore, various examples can be implemented by hardware, software, firmware, middleware, microcode, hardware description languages or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks (such as a computer program product) can be stored in a computer-readable or machine-readable medium. A processor(s) can perform the necessary tasks.
[0051] Virtual reality (VR) describes a three-dimensional, computer-generated environment that can be interacted with in a seemingly real or physical way.
In some cases, a user experiencing a virtual reality environment uses electronic equipment, such as a head-mounted display (HMD) and, optionally, other wearable items, such as gloves fitted with sensors, to interact with the virtual environment. As the user moves in the real world, the images rendered in the virtual environment also change, giving the user the perception of moving within the virtual environment. In some cases, the virtual environment includes sound that correlates with the user's movements, giving the user the impression that the sounds originate from a specific direction or source. Virtual reality video can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience. Virtual reality applications include games, training, education, sports videos and online shopping, among others.
[0052] 360-degree video is video captured for display in a virtual reality environment. For example, a 360-degree video frame can include a full 360 degrees of visible content from a given point, so that the frame includes pixels for all or part of a sphere centered on that point. 360-degree video data can also be referred to as spherical video, because 360-degree video captures a view in all directions, so that each 360-degree video frame can be viewed as a sphere of captured pixels. A 360-degree video frame can be computer generated and can be used to present
fictional environments. In some applications, video from the real world can be used to present a virtual reality environment. In these applications, a user can experience another place from his or her present location. For example, a user can experience a walking tour of Berlin while using a 360-degree video system that is located in San Francisco.
[0053] A 360-degree video system may include a video capture device and a video display device, and possibly also other intermediate devices, such as servers, data storage and data transmission equipment. A video capture device can include a camera set, that is, a set of multiple cameras, each oriented in a different direction and capturing a different view. In various applications, two to six cameras can be used to capture a full 360-degree view centered on the location of the camera set. Some video capture devices may use fewer cameras, such as, for example, video capture devices that primarily capture side-to-side views. A video includes frames, where a frame is an electronically encoded still image of a scene. Cameras capture a certain number of frames per second, which is referred to as the camera's frame rate.
[0054] In some cases, to provide a full 360 degree view, the video captured by each of the cameras in the set of cameras is submitted to the
image stitching process. Image stitching, in the case of 360-degree video generation, involves combining or merging video frames from adjacent cameras in the area where the video frames overlap or otherwise connect. The result is an approximately spherical frame of video data. To integrate with existing video systems, the spherical frame of video data can be mapped to a planar format. To produce an equirectangular shape, mapping techniques such as those used to generate Mercator projections can be used. As another example, the pixels in a merged video frame can be mapped onto the planes of a cube shape or some other three-dimensional planar shape (such as a pyramid, an octahedron, a decahedron, etc.). Video capture and video display devices work on a raster principle, meaning that a video frame is treated as a grid of pixels, so square or rectangular planes are typically used to represent a spherical environment.
[0055] 360-degree video frames, mapped to a planar representation, can be encoded and/or compressed for storage and/or transmission. Encoding and/or compression can be performed using a video codec (such as a codec that is compliant with the High Efficiency Video Coding (HEVC) standard, also known as H.265, or a codec that is compliant with the Advanced Video Coding standard, also known as H.264, or another suitable coding standard), which results in a bitstream
of encoded and/or compressed video, or a group of bitstreams. The encoding of video data using a video codec is described in further detail below.
[0056] In some implementations, the encoded video bitstream can be stored and/or encapsulated in a media format or file format. The stored bitstreams can be transmitted, for example, over a network, to a receiving device that can decode and render the video for display. Such a receiving device can be referred to herein as a video display device. For example, a 360-degree video system can generate encapsulated files from the encoded video data (such as using an International Organization for Standardization (ISO) base media file format and/or derived file formats). For example, the video codec can encode the video data and an encapsulation mechanism can generate the media files by encapsulating the video data in one or more ISO format media files. Alternatively or in addition, the stored bitstreams can be provided directly from a storage medium to a receiving device.
[0057] A receiving device can also implement a codec to decode and / or decompress an encoded video bit stream. In cases where encoded video bit streams are stored and / or encapsulated in a media or file format, the receiving device can support the media or file format that was used to package the video bit stream.
It can extract the video data (and possibly also audio) from the file or files to generate the encoded video data. For example, the receiving device can parse the media files containing the encapsulated video data to generate the encoded video data, and the codec on the receiving device can decode the encoded video data.
[0058] The receiving device can then send the decoded video signal to a rendering device (such as, for example, a video display device, playback device or other suitable rendering device). Rendering devices include, for example, head-mounted displays, virtual reality televisions and other 180 or 360-degree display devices. Generally, a head-mounted display is able to track the movement of the user's head and/or the movement of the user's eyes. The head-mounted display can use the tracking information to render the portion of a 360-degree video that corresponds to the direction in which the user is looking, so that the user can experience the virtual environment in the same way that he or she would experience the real world. A rendering device can render a video at the same frame rate at which the video was captured or at a different frame rate.
[0059] Projections and mappings are used to represent three-dimensional (3-D) surfaces on two-dimensional (2-D) maps. For example, in 360-degree video applications, projections and mappings can be used to map a 360-degree video frame, which captures
pixels in all directions from the camera and can thus be viewed as a sphere, onto a two-dimensional video frame. Examples of two-dimensional projections include an equirectangular projection (ERP) and a cubic map projection (CMP), among others. Figure 1A illustrates a video frame 110 that includes an equirectangular projection of a 360-degree video frame. An equirectangular projection maps points on a sphere to a two-dimensional map by linearly mapping the latitude and longitude of the points on the sphere to (x, y) coordinates in the video frame 110. The equirectangular projection is able to include all of the pixels from the 360-degree video frame in the two-dimensional video frame 110, so transitions from one area of the video frame 110 to another are seamless. Seamless transitions mean that an equirectangular video frame can be encoded efficiently in terms of the size of the encoded video frame. This is because operations such as motion estimation and motion compensation produce better results when the motion between video frames appears continuous.
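As a rough illustration of the linear latitude/longitude relationship just described, the sketch below maps a sphere point to pixel coordinates of an equirectangular frame. The frame size and the angle conventions assumed here (longitude in [-pi, pi], latitude in [-pi/2, pi/2]) are illustrative choices, not values taken from this description.

```python
import math

def erp_pixel_for_sphere_point(longitude, latitude, frame_width, frame_height):
    """Linearly map a sphere point to (column, row) coordinates of an
    equirectangular frame. Longitude is assumed to be in [-pi, pi] and
    latitude in [-pi/2, pi/2]; both ranges are illustrative assumptions."""
    # Longitude maps linearly across the frame width.
    col = (longitude + math.pi) / (2.0 * math.pi) * (frame_width - 1)
    # Latitude maps linearly down the frame height (north pole at row 0).
    row = (math.pi / 2.0 - latitude) / math.pi * (frame_height - 1)
    return col, row

# Example: the point at zero longitude on the equator lands at the frame center.
print(erp_pixel_for_sphere_point(0.0, 0.0, 4096, 2048))  # -> (2047.5, 1023.5)
```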
[0060] Figure 1B illustrates a video frame 120 that includes a cubic map projection of a 360-degree video frame. The cubic map projection projects points on the surface of a sphere to points on planes that are tangent to the surface of the sphere. That is, the pixels are fit onto the six faces of a cube, where the height, width and length of the cube can be such that the cube fits within the sphere. The example in Figure 1B is a 3 x 2 arrangement; that is, three cube faces
across and two cube faces high. The 3 x 2 arrangement results in an aspect ratio that can be encoded efficiently. For example, less data per line of pixels needs to be stored than if a layout such as 6 x 1 were used.
[0061] In the exemplary video frame 120 of Figure 1B, one face of the cube, which can be considered a front face 122, is located in the middle of the upper half of the video frame 120. The cube faces to the right and to the left of the front face (such as a right face 124 and a left face 126) are located on the right and left sides, respectively, of the top half of the video frame 120. The cube face that can be considered the back face 128 is rotated by -90 degrees and located in the center of the bottom half of the video frame 120. The cube face that can be considered the top or upper face 130 is located to the right of the back face 128 and is also rotated so that the edge of the top face 130 coincides with the edge of the back face 128. The cube face that can be considered the bottom or lower face 132 is located to the left of the back face 128, and rotated to match the edge of the back face 128.
[0062] In the example of Figure 1B, the pixels included in the front face 122 were selected as the view to be located directly in front of the viewer. In other examples, a different part of the video data can be selected to be viewed from the front. In addition, the arrangement of the cube faces illustrated in the exemplary video frame 120 of Figure 1B is one exemplary arrangement. Other arrangements are also
possible.
[0063] A cubic map projection can be more compact than an equirectangular projection, due to some compression of pixel data that happens at the edges of the cube faces. The cubic map also produces less image distortion, which can improve the efficiency of the encoding.
[0064] Another projection is one referred to as the segmented spherical projection (SSP). The segmented spherical projection is described in Y. Ye, E. Alshina and J. Boyce, Algorithm descriptions of projection format conversion and video quality metrics in 360Lib, JVET-E1003, Jan. 2017, which is incorporated here by reference in its entirety and for all purposes. Figure 2A is a diagram illustrating the segmented spherical projection of the surface of a sphere 202 to an exemplary two-dimensional vertical mapping 210, generated according to the segmented spherical projection. The segmented spherical projection divides the sphere into three segments: a north pole segment 204, a south pole segment 208 and an equatorial segment 206. The north pole and south pole segments are also referred to here as spherical poles or spherical pole segments. In the illustrated example, the three segments are divided at a latitude of 45 degrees north and 45 degrees south (as measured, for example, from the center of the sphere 202). In other examples, the three segments can be divided at a different latitude.
[0065] In the example two-dimensional mapping 210 illustrated in Figure 2A, the area covered by the
north pole segment 204 is mapped to a first circular region, which will be referred to as a top view 214. Similarly, the area covered by the south pole segment 208 is mapped to a second circular region, which will be referred to as a bottom view 218. In this example, the bottom view 218 is located on the map 210 near, and below, the top view 214. The top view 214 and the bottom view 218 are also labeled Face 0 and Face 1, respectively. The equatorial segment 206 is divided into four equal segments, and each segment is mapped to a square area; the square areas are placed on the map 210 one below another, below the bottom view 218. For the purposes of this example, the square areas for the equatorial region 206, from top to bottom, will be referred to as: the left view 216a, the front view 216b, the right view 216c and the rear view 216d, or Face 2, Face 3, Face 4 and Face 5, respectively. In other examples, the left, right, front and rear views can be arranged in orders different from those described here. In other examples, the areas onto which the equatorial segment 206 is mapped may not be square. For example, when an angle other than 45 degrees is used to delineate the polar regions, rectangular areas that are not square may better fit the pixel data and may result in less distortion than if, as in this example, the data were mapped to square areas.
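A minimal sketch of the segmentation just described: given a point of the spherical representation, it decides whether the point falls in the top (north pole), bottom (south pole) or equatorial segment, and, for the equatorial segment, which of the four views it belongs to. The 45-degree boundary follows the example above; the face numbering order and the longitude span assigned to each equatorial view are assumptions made for illustration.

```python
import math

def ssp_face_for_point(longitude, latitude):
    """Return a segmented-sphere-projection face index (0-5) for a point,
    with longitude/latitude in radians. Face 0 = top (north pole), Face 1 =
    bottom (south pole), Faces 2-5 = the four equatorial views (assumed order)."""
    if latitude >= math.pi / 4:        # above 45 degrees north
        return 0
    if latitude <= -math.pi / 4:       # below 45 degrees south
        return 1
    # Equatorial segment: split the 360 degrees of longitude into four views.
    # Which 90-degree span is "front" is an illustrative assumption.
    quadrant = int((longitude + math.pi) // (math.pi / 2)) % 4
    return 2 + quadrant

print(ssp_face_for_point(math.radians(10), math.radians(60)))  # -> 0 (top view)
print(ssp_face_for_point(math.radians(10), math.radians(0)))   # -> an equatorial face
```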
[0066] In a video application, the pixels from the north pole segment 204 and the south pole segment 208 can be mapped to the circular regions of the top view 214 and the bottom view 218,
respectively, using an angular projection commonly known as a fisheye projection. In this example, the diameter of the circular regions in each of the top view 214 and the bottom view 218 is the same as the edge of each of the equatorial segments, due to each view covering 90 degrees of latitude. Each of the left view 216a, the front view 216b, the right view 216c and the back view 216d can be generated using the projection used to generate the equirectangular projection, which can result in relatively smooth transitions between these views.
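One possible reading of the circular fisheye mapping of a pole segment is sketched below, assuming an equidistant (angular) fisheye model: the angular distance from the pole becomes the radius within the disc and the longitude becomes the angle around it, so that the 45-degree boundary lands on the rim of a disc whose diameter equals the side of an equatorial face. The equidistant model and the pixel indexing are assumptions made for illustration.

```python
import math

def fisheye_pole_mapping(longitude, latitude, face_size):
    """Map a point of the north pole segment (latitude in [pi/4, pi/2]) onto a
    disc of diameter face_size, using an equidistant fisheye model (an
    illustrative assumption, not a formula taken from this description)."""
    # Angular distance from the pole, normalized so the 45-degree boundary
    # lands on the rim of the disc (radius 1).
    r = (math.pi / 2 - latitude) / (math.pi / 4)
    u = r * math.cos(longitude)
    v = r * math.sin(longitude)
    # Shift from [-1, 1] disc coordinates to pixel coordinates in the square face.
    col = (u + 1.0) / 2.0 * (face_size - 1)
    row = (v + 1.0) / 2.0 * (face_size - 1)
    return col, row

# The pole itself (latitude pi/2) lands at the center of the face.
print(fisheye_pole_mapping(0.0, math.pi / 2, 1280))  # -> (639.5, 639.5)
```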
[0067] Figure 2B is a diagram illustrating an exemplary two-dimensional video frame 220 generated using a 3 x 2 array of mappings that can be generated using segmented spherical projection. In the exemplary video frame 220, the front view 216b is located in the middle of the upper half of the video frame 220. The left view 216a and the right view 216c are located on the left and right, respectively, of the front view 216b . The rear view 216d is rotated by -90 degrees and is located in the middle of the lower half of the video frame 220. The top view 212 is also rotated in such a way that the left edge of the top view is aligned with the right edge of the rear view 216d, and located to the right of rear view 216d. The bottom view 218 is also rotated so that the right edge of the bottom view 218 aligns with the left edge of the rear view 216d and is located to the left of the rear view 216d. In this example, aligning means that at least a few pixels from each view, which would be adjacent on the sphere
202, are adjacent in the video frame 220. In this example, the corner areas of the top view 212 and the bottom view 218, which are outside the fisheye projection, are filled with a gray color. In other examples, these corner areas can be filled with another color.
[0068] The segmented spherical projection can generate video frames that have better coding efficiency than video frames generated using the equirectangular projection or the cubic map projection. For example, using the segmented spherical projection can result in less distortion and smoother transitions, both of which can improve coding efficiency.
[0069] Nevertheless, an even better coding efficiency can be achieved. For example, in the top and bottom views, the corner areas do not capture pixel data and thus add data to a video frame that is not needed when displaying the contents of the video frame. This extra data can, in some cases, also result in an encoded video frame that is larger than when using a cubic map projection, while providing the same number of active pixels.
[0070] Segmented spherical projection also introduces some new problems. For example, the circumference of the top and bottom views is shorter than the combined width of the left, front, right and back views. This difference can result in a visible boundary when views are brought together to
be displayed, between the north pole region and the equatorial region, and between the equatorial region and the south pole region.
[0071] In several implementations, systems and methods are provided for processing 360 degree video data, using a segmented spherical projection, which avoid the problems discussed above. In several implementations, segmented spherical projection can be used to map a 360-degree video frame to a two-dimensional rectangular shape, which may be easier for manipulation by video transmitters and receivers. In generating this mapping, the circular regions of the north and south pole of the segmented spherical projection can be expanded, using various techniques, to fill a rectangular region of the output video frame. When mapping the polar regions in all corners of a rectangular region, all available pixels in the output video frame can include usable data. Therefore, the perimeter of the top and bottom views can be made equal to the total length of the combined left, front, right and back views, thereby reducing any distortion or artifacts at the edges in the top and bottom views. Furthermore, the additional pixels resulting from the expansion of the polar regions can result in a denser sampling of pixels in the polar regions and thus a more accurate representation in these areas.
[0072] As noted above, the equatorial region of the segmented spherical projection can be mapped to one or more square or rectangular areas of the output video frame using techniques, such as those
that can be used to generate an equirectangular projection. The equatorial region can also be mapped using other projections, such as equal-area cylindrical projections. The use of equal-area cylindrical projections is further discussed in U.S. Patent No. (Attorney Docket No. 173550), filed in , which is incorporated herein by reference in its entirety.
[0073] Circle-to-square mapping techniques can be used to map the polar regions of the segmented sphere projection to square or rectangular areas of the output video frame. Figure 3 is a diagram illustrating an example of mapping a circle 302 to a square 304 and a square 304 to a circle 302. Several techniques can be used to perform these mappings, some of which are described in M. Lambers, Mappings between Sphere, Disc and Square, Journal of Computer Graphics Techniques, vol. 5, No. 2, 2016, which is incorporated here by reference in its entirety and for all purposes.
[0074] Figure 4 is a diagram illustrating exemplary output for various techniques for mapping a square 404 to a circle and a circle 402 to a square. The techniques illustrated include radial stretching 412, Shirley equal-area mapping 414, Fernández-Guasti squircle mapping 416 (which will be referred to here as the squircular mapping), elliptical arc mapping 418 and conformal mapping 420. These and other techniques can produce varying degrees of distortion in different parts of the output mapping. In a
video application, techniques that result in the least amount of modification of the original image can be used, such as the squircular mapping 416 or the elliptical arc mapping 418. Preserving as much of the original image as possible can be advantageous for coding efficiency.
[0075]
Nevertheless, any of the techniques discussed in Lambers, and many other techniques, can be used to map a circle to a square. The squircular mapping and the elliptical arc mapping will be used as examples to illustrate the use of the segmented spherical projection for mapping 360-degree video data to a two-dimensional rectangular shape. In other examples, other square-to-circle mapping techniques can be used.
[0076] The squircular mapping provides square-to-circle mapping using the following equations:
u = x · √(x² + y² − x²y²) / √(x² + y²)    (1)
v = y · √(x² + y² − x²y²) / √(x² + y²)    (2)
[0077] In equations (1) and (2), (x, y) are Cartesian coordinates within the square and (u, v) are Cartesian coordinates within the circle.
[0078] The elliptical arc mapping provides square-to-circle mapping using the following equations:
u = x · √(1 − y²/2)    (3)
v = y · √(1 − x²/2)    (4)
[0079] Mapping a 360-degree video frame to a two-dimensional rectangular shape involves converting from the three-dimensional space of the 360-degree video data to the two-dimensional space of the output video frame. Performing this conversion can include selecting a pixel location (m, n) in the output video frame and determining a corresponding point (φ, θ) on the spherical video data. A pixel sample can be taken from the point (φ, θ) and placed at the location (m, n) in the output video frame.
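A minimal sketch of the two square-to-circle mappings of equations (1) through (4), with (x, y) and (u, v) both in [−1, 1] as defined above. The guard for the degenerate point at the origin is an implementation detail added here for illustration.

```python
import math

def squircular_square_to_circle(x, y):
    """Equations (1) and (2): squircular mapping of a point (x, y) in the unit
    square to a point (u, v) in the unit disc."""
    d = math.hypot(x, y)
    if d == 0.0:                      # origin maps to origin (added guard)
        return 0.0, 0.0
    s = math.sqrt(x * x + y * y - x * x * y * y) / d
    return s * x, s * y

def elliptical_arc_square_to_circle(x, y):
    """Equations (3) and (4): elliptical arc mapping of (x, y) to (u, v)."""
    u = x * math.sqrt(1.0 - (y * y) / 2.0)
    v = y * math.sqrt(1.0 - (x * x) / 2.0)
    return u, v

# A corner of the square lands on the rim of the circle under both mappings.
print(squircular_square_to_circle(1.0, 1.0))      # -> (~0.707, ~0.707)
print(elliptical_arc_square_to_circle(1.0, 1.0))  # -> (~0.707, ~0.707)
```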
[0080] In some examples, such as those discussed above, the polar regions of the spherical data can be mapped to a rectangular area of the output video frame. In these examples, the dimensions of the square can be denoted as A x A. In other examples, the length and width of the rectangular area may be different from each other.
[0081] Pixel locations in a video frame are usually given in a raster order, with the pixel position of number zero in the top left corner of the video frame. Thus, a first step in converting a 3-D space to a 2-D space is to renormalize the coordinates (m, n) in the video frame to Cartesian coordinates (x, y). This can be done using the following equations:
x = (2/A) · (m + 1/2) − 1    (5)
y = (2/A) · (n + 1/2) − 1    (6)
[0082] In one example, equations (5) and (6) can be combined with the square-to-circle equations provided by the squircular mapping to determine the Cartesian coordinates (u, v) in the circle:
s = √(x² + y² − x²y²) / √(x² + y²)    (7)
u = s · x    (8)
v = s · y    (9)
[0083] In another example, equations (5) and (6) can be combined with the square-to-circle equations provided by the elliptical arc mapping to determine (u, v):
u = x · √(1 − y²/2)    (10)
v = y · √(1 − x²/2)    (11)
[0084] The next step is to determine the three-dimensional polar coordinates that correspond to (u, v), as determined in either of the above examples or using another method for square-to-circle mapping. As shown in Figure 2A, the three-dimensional polar coordinates include a radius, an equatorial angle φ (such as an angle along
of an equator of the sphere from a point selected as zero degrees) and a vertical angle θ (such as an angle between the equator and one of the poles). The polar coordinates for the north pole region (face 0 in the example of Figure 2A) can be determined using the following equations:
φ = tan⁻¹(v/u)    (12)
θ = (π/4) · (2 − r)    (13)
[0085] The polar coordinates for the south pole region (face 1 in the example of Figure 2A) can be determined using the following equations:
φ = tan⁻¹(v/u) + π    (14)
θ = −(π/4) · (2 − r)    (15)
[0086] For both the north pole region and the south pole region, r = √(u² + v²).
[0087] Figure 5A and Figure 5B are diagrams that illustrate examples of the polar regions of spherical video data that were mapped using an angular fisheye projection, which can also be described as a circular polar mapping. Figure 5A illustrates a bottom view 508, obtained by mapping the south pole region. Figure 5B illustrates a top view 504, obtained by mapping the north pole region.
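As a rough illustration of how equations (5) through (15) fit together, the sketch below computes, for a pixel location (m, n) in an A x A pole face of the output video frame, a corresponding sphere point (φ, θ), using the squircular mapping for the square-to-circle step. The orientation conventions (the axis from which φ is measured and the sign handling for the south pole) are assumptions made for illustration rather than a definitive implementation.

```python
import math

def pole_face_pixel_to_sphere(m, n, face_size, north_pole=True):
    """For a pixel (m, n) in an A x A pole face of the output frame, return
    polar coordinates (phi, theta) of the sphere point to sample. Follows the
    square-to-circle approach of equations (5)-(9) and (12)-(15); orientation
    conventions are illustrative assumptions."""
    A = float(face_size)
    # Equations (5) and (6): raster position to Cartesian coordinates in [-1, 1].
    x = (2.0 / A) * (m + 0.5) - 1.0
    y = (2.0 / A) * (n + 0.5) - 1.0
    # Equations (7)-(9): squircular square-to-circle mapping.
    d = math.hypot(x, y)
    s = math.sqrt(x * x + y * y - x * x * y * y) / d if d > 0.0 else 0.0
    u, v = s * x, s * y
    # Equations (12)-(15): circle coordinates to sphere angles. theta runs from
    # the pole (disc center) down to 45 degrees latitude (disc rim).
    r = math.hypot(u, v)
    phi = math.atan2(v, u)
    theta = (math.pi / 4.0) * (2.0 - r)
    if not north_pole:
        phi += math.pi
        theta = -theta
    return phi, theta

# The corner pixel of the top face samples a point close to 45 degrees latitude.
print(pole_face_pixel_to_sphere(0, 0, 1280))
```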
[0088] As discussed above, a fisheye projection results in the pixels from the north pole
and south pole regions occupying a circular area 522, 524 within the square areas 526, 528 into which the pixels are mapped. The projection is able to preserve most of the data from the spherical video data, although some loss can occur due to the pixels being distorted into a circular shape. In addition, each of the square regions has corner areas 530 where the pixels are filled with gray or some other value instead of pixel data from the spherical video data. When encoded, the corner areas 530 can reduce coding efficiency, due to containing non-video data. In addition, the corner areas 530 add unnecessary data, since the data in the corner areas 530 will be discarded when the video frame is reassembled for display.
[0089] Figure 6A and Figure 6B are diagrams that illustrate examples of the polar regions of spherical video data that were mapped using the equations discussed above. Figure 6A shows a bottom view 608 and Figure 6B shows a top view 604. The bottom view 608 and the top view 604 each start with the same data that was mapped to the views shown in Figure 5A and Figure 5B. In Figure 6A and Figure 6B, however, applying the above equations results in the data being extended to fill the corner regions 630 of each square area 626, 628. In these examples, no unnecessary data is added to the output video frame. In addition, more pixel data from the spherical video data can be preserved by extending the data into the corner regions 630, rather than deforming
the data into the circular region.
[0090] Figure 7 illustrates an example of a video frame 710 generated by mapping a 360-degree video frame using a segmented spherical projection and the equations discussed above. The exemplary video frame 710 includes a 3 x 2 arrangement of the left, front, right, bottom, back and top views. In the upper half of the video frame 710, the left view 726, the front view 722 and the right view 724 are arranged side by side to form a continuous region. In the lower half of the video frame 710, the rear view 728 is rotated by -90 degrees and is located in the middle. In this example, the data for the top view 730 is rotated by 45 degrees before being mapped to the square area to the right of the back view 728. The bottom view 732 is similarly rotated by 45 degrees before being mapped to the square area to the left of the rear view 728.
[0091] The arrangement of the bottom view 732, the rear view 728 and the top view 730 on the bottom half of the video frame 710 results in an almost continuous region. Smooth transitions between each view are desirable because encoding the video frame can result in a more compact encoded representation than when
transitions are abrupt. In other examples, other arrangements of the views can be used, such as a 1 x 6 or a 6 x 1 layout.
Alternatively or additionally, in other examples, the top and bottom views can be located at the top or bottom of the video frame 710, on the left or right,
Petition 870190092259, of 16/09/2019, p. 43/118
39/83 or elsewhere in the video frame 710. Alternatively or in addition, other rotations of the top and bottom views can be applied before the top and bottom views are mapped to the video frame, to achieve different almost continuous regions.
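As an illustration of the arrangement described above, the following sketch shows one way the 3 x 2 layout of Figure 7 could be assembled from six already-mapped face images. The numpy-based helper, the assumption that every face is an A x A image, and the face argument names are illustrative only; the 45-degree rotation of the top and bottom views is part of the sphere-to-square mapping itself and is therefore not repeated here.

import numpy as np

def assemble_3x2_frame(left, front, right, back, top, bottom):
    # Assemble a 3 x 2 output video frame from six A x A face images,
    # following the arrangement described for video frame 710.
    A = front.shape[0]
    frame = np.zeros((2 * A, 3 * A) + front.shape[2:], dtype=front.dtype)

    # Upper half: left, front and right views side by side (continuous region).
    frame[0:A, 0:A] = left
    frame[0:A, A:2 * A] = front
    frame[0:A, 2 * A:3 * A] = right

    # Lower half: bottom view, back view rotated by -90 degrees, top view.
    frame[A:2 * A, 0:A] = bottom
    frame[A:2 * A, A:2 * A] = np.rot90(back, k=-1)
    frame[A:2 * A, 2 * A:3 * A] = top
    return frame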
[0092] The continuity between the pixels in the video frame 710 can result in better coding efficiency and can also reduce the occurrence of visible artifacts or defects when the video frame 710 is projected for display. In the exemplary video frame 710 of Figure 7, some discontinuity is evident where the corner regions 750 of the top view 730 and the bottom view 732 meet the rear view 728. This discontinuity may be due to the different methods being used to produce the rear view 728 and the top and bottom views, and/or to differences in the shape of the data being taken from the spherical video frame.
[0093] The discontinuity caused by the corner regions 750 can be reduced by gradually and continuously adjusting the pixel sampling from the spherical video data. Taking the top view 730 as an example, samples can be taken from the spherical data with a gradual adjustment from the edge adjacent to the back view 728 toward (in this example) the right edge of the top view 730. In addition, the sampling adjustment can be applied more toward the outside edges of the top view 730 (for example, toward the corner regions 750) than toward the middle, where the discontinuity with the rear view is less evident. The same adjustments can be applied when mapping the bottom view 732.
[0094] In several implementations, the gradual sampling of pixels includes adjusting the conversion from 2-D to 3-D discussed above. For example, the Cartesian coordinates (x, y) that correspond to a point (m, n) selected in the video frame can be determined using the following equations:

x = 2(m + 0.5)/A − 1    (16)

y = 2(n + 0.5)/A − 1    (17)
[0095] As before, A is the length of the side of the square area on which the spherical video data is being mapped.
[0096] For the top view (such as, for example, face 0), the x coordinate can be adjusted according to the following equations:
x' = 1 + tanh((−1 − y)/b)    (18)

x = tan(tan⁻¹(x')·x) / x'    (19)

[0097] For the bottom view (for example, face 1), the x coordinate can be adjusted according to the following equations:

x' = 1 + tanh((y − 1)/b)    (20)

x = tan(tan⁻¹(x')·x) / x'    (21)

[0098] In the above equations, b is a parameter that can be used to vary the amount by which
the pixel sampling changes from the edge of the top or bottom view towards the middle of the view. The selection of a value for b is discussed below. In some examples, a value of 0.2 for b reduces the discontinuity between the rear view and the top view without significantly affecting the sampling of pixels in the central area of the top view.
[0099] Note that in the example of Figure 7, because the rear view 728 is rotated by -90 degrees, the x axis is from top to bottom and the y axis from left to right.
[0100] In several implementations, the y coordinate is not adjusted, as determined by equation (17). In addition, the hyperbolic tangent function is used as an exemplary function that produces a gradual transition between 0 and 1. In other examples, other functions, such as sine, polynomial functions or other functions that produce a gradual transition, can be used.
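A minimal sketch of the gradual sampling adjustment is given below. It assumes the reconstructed forms of equations (16) through (19) shown above (including the sign chosen inside the hyperbolic tangent), and the function names are illustrative only.

import math

def normalized_coords(m, n, A):
    # Equations (16) and (17): frame location (m, n) in an A x A face
    # to normalized Cartesian coordinates (x, y) in [-1, 1].
    x = 2.0 * (m + 0.5) / A - 1.0
    y = 2.0 * (n + 0.5) / A - 1.0
    return x, y

def adjust_x_top(x, y, b=0.2):
    # Equations (18) and (19) as reconstructed above for the top view:
    # x' is 1 at the edge shared with the rear view (y = -1) and decays
    # toward 0 at the opposite edge (y = 1), at a rate controlled by b.
    x_prime = 1.0 + math.tanh((-1.0 - y) / b)
    if x_prime == 0.0:
        return x  # gradual sampling disabled when x' = 0
    return math.tan(math.atan(x_prime) * x) / x_prime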
[0101] In several implementations, the adjusted x value and the y value can additionally be used in the square-to-circle mapping discussed above. Figure 8 illustrates an example of a first partial video frame 810 that was mapped without using the gradual transition technique discussed above, and a second partial video frame 820 that was mapped according to the gradual transition technique. In this example, a top view 830 appears at the top of each partial video frame, a bottom view 832 is at the bottom of each partial video frame, and a back view 828 is in the middle.

[0102] In the first partial video frame 810, for example, several discontinuities 850 are circled. These discontinuities 850 occur where the top view 830 and the bottom view 832 meet the rear view 828. The discontinuities 850 appear as a horizontal line, which may be visible when the video frame is presented for display.

[0103] In the second partial video frame 820, the discontinuity area 852 is also highlighted, but because the adjusted x-coordinate equation was used to map the top view 830 and the bottom view 832, the discontinuity is less evident.
[0104] Note that, in this example, the x axis is in the horizontal direction. Note also that, in this example, the adjustment to the x coordinate is only applied where the top view 830 and the bottom view 832 meet the rear view 828, and not at the top and bottom edges of the second partial video frame 820. In other examples, the adjustment can also be applied to the top and bottom edges of the frame.
[0105] As noted above, a parameter b is used in equations (18) and (19) to adjust the degree to which the x coordinate changes as x decreases or increases. A higher value for b can result in a more abrupt transition from, for example, the rear view to the top view (resulting in a possibly visible discontinuity), and a lower value for b can result in a smoother transition. A lower value for b, however, can cause more pixels towards the center of the view to be affected. Limiting the modification of the pixels in the center of the view may be desirable, because keeping these pixels as close as possible to the orientation of the pixels in the spherical video data can result in both better encoding efficiency and better appearance when the video frame is displayed.
[0106] As an example of the effect of different values for b, Figure 9 illustrates a graph 900 in which (x', y) is plotted according to equation (18) for different values of b. Where the top view meets the rear view, y = -1, and the edge of the top view that is opposite the rear view is at y = 1. At x' = 0, the gradual sampling is disabled, which means, for example, that square or elliptical arc mapping applies. At x' = 1, the sampling matches the sampling at the edge of the rear view. The leftmost plot on graph 900 corresponds to b = 0.2. Each successive plot, moving from left to right, increases b by 0.1. The rightmost plot is at b = 2.0.
[0107] As can be seen from the example illustrated in Figure 9, as b increases, the sampling adjustment increasingly affects the mapping of the top view, until all pixels in the view are affected. As noted earlier, modifying samples in the mid-view area can adversely affect the encoding efficiency of the output video frame. At lower values of b, however, the change to y decays quickly, resulting in the adjustment being limited to a certain area.
[0108] A video frame produced according to the techniques discussed above can be encoded for storage and/or transmission. The video frame can subsequently be decoded for display. To display the video frame, the pixels in the video frame can be mapped from the two-dimensional arrangement of the video frame back to a three-dimensional representation, for example, as spherical data. The reconstructed spherical data can then be displayed, for example, using a virtual reality capable display device.
[0109] To reconstruct the spherical video data, the inverse of the operations used to map the video frame can be applied. For example, the left, front, right and back views from the video frame can be mapped back to the equatorial region of the segmented sphere projection using, for example, the inverse of the projection used to generate an equirectangular projection. The top and bottom views can be mapped back to the north and south pole regions by selecting a point (φ, θ) on the sphere (that is, a horizontal angle and a vertical angle; the radius is constant) and determining a corresponding point (m, n) in the video frame. A pixel sampled from the point (m, n) can then be placed at (φ, θ).
[0110] In the examples that follow, the circle-to-square mappings provided by the square mapping and the elliptical arc mapping will be used as examples of techniques that
can be used to convert the pixel data stored in a square area of the video frame into a circular area. In other examples, other circle-to-square mapping techniques can be used.
[0111] For the top view (for example, face 0), θ ∈ [π/4, π/2] and φ ∈ [−π, π]. To convert the polar coordinates (φ, θ) into Cartesian coordinates (u, v), the following equations can be used for the top view:

u = ((π/2 − θ)·sin φ) / (π/4)    (22)

v = ((π/2 − θ)·cos φ) / (π/4)    (23)

[0112] For the bottom view (for example, face 1), θ ∈ [−π/2, −π/4] and φ ∈ [−π, π]. To convert the polar coordinates (φ, θ) into Cartesian coordinates (u, v), the following equations can be used for the bottom view:

u = ((π/2 + θ)·sin φ) / (π/4)    (24)

v = ((π/2 + θ)·cos φ) / (π/4)    (25)

[0113] Then, given a point (u, v), a corresponding location (x, y) in the video frame can be determined. As a first example, quadracircle mapping provides the following equations for performing a circle-to-square mapping:
w = (sgn(u·v)/√2)·√(u² + v² − √((u² + v²)·(u² + v² − 4u²v²)))    (26)

(x, y) = (w/v, w/u) if w ≠ 0, and (x, y) = (u, v) otherwise    (27)

[0114] In equation (26), sgn is the sign function.
[0115] As a second example, elliptical arc mapping provides the following equations for performing a circle-to-square mapping:
x = ½·√(2 + u² − v² + 2√2·u) − ½·√(2 + u² − v² − 2√2·u)    (28)

y = ½·√(2 − u² + v² + 2√2·v) − ½·√(2 − u² + v² − 2√2·v)    (29)

[0116] Finally, the coordinates (x, y) can be denormalized to the video frame coordinate system. As noted above, (x, y) are Cartesian coordinates, although the video frame can use the upper left corner as the point (0, 0). The conversion to the coordinates (m, n) of the video frame can be determined using the following equations:

m = (A/2)·(x + 1) − 0.5    (30)

n = (A/2)·(y + 1) − 0.5    (31)

[0117] A location (m, n) determined using either square mapping or elliptical arc mapping (or another technique) can be used to select a pixel from the video frame. The pixel can then be mapped to the point (φ, θ) on the spherical representation of the video frame.
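The decode-side sampling path for the top pole face can be summarized by the sketch below. It assumes the equations (22), (23) and (28) through (31) as reconstructed above, uses the elliptical arc mapping for the circle-to-square step, and all function and variable names are illustrative.

import math

def top_face_location(phi, theta, A):
    # Given sphere angles (phi, theta) in the top region (theta in [pi/4, pi/2]),
    # return the location (m, n) to sample in the A x A top face of the frame.

    # Equations (22) and (23): polar angles to circle coordinates (u, v).
    r = (math.pi / 2.0 - theta) / (math.pi / 4.0)
    u = r * math.sin(phi)
    v = r * math.cos(phi)

    # Equations (28) and (29): elliptical arc circle-to-square mapping.
    s = 2.0 * math.sqrt(2.0)
    x = 0.5 * math.sqrt(2.0 + u * u - v * v + s * u) - 0.5 * math.sqrt(2.0 + u * u - v * v - s * u)
    y = 0.5 * math.sqrt(2.0 - u * u + v * v + s * v) - 0.5 * math.sqrt(2.0 - u * u + v * v - s * v)

    # Equations (30) and (31): denormalize to frame coordinates.
    m = (A / 2.0) * (x + 1.0) - 0.5
    n = (A / 2.0) * (y + 1.0) - 0.5
    return m, n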
[0118] In several implementations, the gradual adjustment of some parts of the video frame may have been applied to reduce the visible distortion that is caused by unaligned pixels at the boundaries between views. For example, the gradual adjustment may have been made using the techniques discussed above. In these examples, the x coordinate can be adjusted before converting (x, y) to (m, n), using the following equations:
x' = 1 + tanh((−1 − y)/b)    (32)

x = tan⁻¹(x'·x) / tan⁻¹(x')    (33)

[0119] As noted earlier, a video frame generated using a combination of segmented sphere mapping and a quadracircle or elliptical arc mapping can be encoded more efficiently than a video frame generated using only segmented sphere mapping. For example, for the same number of two-dimensional map samples, quadracircle mapping can outperform the segmented sphere projection by approximately 1% under common test conditions, as described in J. Boyce, E. Alshina, A. Abbas, Y. Ye, "JVET common test conditions and evaluation procedures for 360-degree video", JVET-E1030, which is incorporated herein by reference in its entirety and for all purposes.
[0120] Figure 10 is a flow diagram that illustrates an exemplary process 1000 for processing video data according to the techniques discussed above.
In 1002, process 1000 includes obtaining 360 degree video data, which includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame. In some instances, 360-degree video data can be obtained directly from a video capture device. In these examples, the spherical representation can include multiple images that have been captured simultaneously, such as multiple rectangular images or one or more fisheye images. Alternatively or in addition, 360-degree video data can include video frames in which multiple images have been stitched together by the video capture device or another device. In some examples, 360-degree video data obtained in a rectangular format (such as an equirectangular or cube map format) can be mapped to a spherical representation.
[0121] In 1004, process 1000 includes segmenting a video frame from the plurality of video frames into a top region, a middle region and a bottom region. The top region includes a first circular area of the spherical representation. The bottom region includes a second circular area of the spherical representation that is opposite the first circular area in the spherical representation. The middle region includes an area of the spherical representation not included in the top or bottom region. The video frame can be segmented at a first latitude above an equator of the spherical representation and at a second latitude below the equator. The first latitude and the second latitude can be equidistant from the equator. In some examples, the angle of the latitudes is 45 degrees from the equator. In other examples, the angle of the latitudes is greater than or less than 45 degrees.
[0122] In some implementations, process 1000 includes mapping the middle region to one or more rectangular areas of an output video frame. Mapping the middle region can include, for example, selecting a pixel location in the output video frame and determining a point on the spherical representation that corresponds to the pixel location. In this example, the point on the spherical representation can be determined using a mapping to convert a two-dimensional rectangle to a three-dimensional sphere, such as an equirectangular projection. Mapping the middle region can also include sampling a pixel at the point of the spherical representation and placing the sampled pixel at the pixel location in the video frame.
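One plausible form of the middle-region sampling is sketched below. It assumes that the middle region spans latitudes from -45 to 45 degrees and all longitudes, as described above, and that an equirectangular mapping is used; the rectangle dimensions W and H and the function name are illustrative.

import math

def middle_region_angles(m, n, W, H):
    # For a pixel location (m, n) in a W x H rectangular area holding the
    # middle (equatorial) region, return the sphere angles (phi, theta)
    # to sample, assuming an equirectangular mapping over longitudes
    # [-pi, pi] and latitudes [-pi/4, pi/4].
    phi = (2.0 * (m + 0.5) / W - 1.0) * math.pi
    theta = (1.0 - 2.0 * (n + 0.5) / H) * (math.pi / 4.0)
    return phi, theta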
[0123] In 1006, process 1000 includes mapping the top region into a first rectangular area of the output video frame. Mapping the top region can include expanding the video data included in the first circular area to fill the first rectangular area, so that the first rectangular area has no pixel locations that do not include pixels from the video frame.
[0124] In 1008, process 1000 includes mapping the bottom region into a second rectangular area of the output video frame. Mapping the bottom region can include expanding the video data included in the second circular area to fill the second rectangular area.
[0125] Mapping the top region and mapping the bottom region may include, for example, selecting a pixel location in the output video frame and determining a point on the spherical representation that corresponds to the pixel location. In this example, the point on the spherical representation can be determined using a mapping for converting a square to a circle, such as a square mapping or an elliptical arc mapping or other mapping. Mapping the top and bottom regions can also include sampling a pixel from the point on the spherical representation and placing the sampled pixel at the pixel location in the output video frame.
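A sketch of this square-to-circle sampling step for the top region is shown below. The square-to-circle formula used here is the standard elliptical grid mapping, which the elliptical arc equations given earlier invert; it is one plausible choice rather than the only one, and the names are illustrative.

import math

def top_region_angles(m, n, A):
    # For a pixel location (m, n) in the A x A square area holding the top
    # region, return the point (phi, theta) on the spherical representation
    # from which to sample.

    # Normalize to Cartesian coordinates in [-1, 1] (equations (16), (17)).
    x = 2.0 * (m + 0.5) / A - 1.0
    y = 2.0 * (n + 0.5) / A - 1.0

    # Square-to-circle step: stretch the square so its corners reach the
    # unit circle (standard elliptical grid mapping, assumed here).
    u = x * math.sqrt(1.0 - y * y / 2.0)
    v = y * math.sqrt(1.0 - x * x / 2.0)

    # Invert equations (22) and (23) to recover the angles for face 0.
    r = math.hypot(u, v)                 # equals (pi/2 - theta) / (pi/4)
    theta = math.pi / 2.0 - (math.pi / 4.0) * r
    phi = math.atan2(u, v)
    return phi, theta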
[0126] In some examples, the mapping to convert a square to a circle minimizes distortion in the output video frame. In these examples, the central area of the first and second rectangular areas includes a direct mapping of the spherical representation to the output video frame, such that little distortion results in this area.
[0127] In some examples, the mapping of the top and bottom regions may additionally include adjusting the pixel location using a gradual curve function. The gradual curve function can be used, for example, at pixel locations in an area adjacent to at least one of the one or more rectangular areas. For example, when the first rectangular area is adjacent to another rectangular area (such as, for example, one of the rectangular areas for the middle region),
the gradual curve function can be applied. As another example, where the second rectangular area is adjacent to the other rectangular area, the gradual curve function can be applied. The application of the gradual curve function can reduce the distortion that can appear where the first and second rectangular areas meet other rectangular areas in the video frame. The gradual curve function can change pixel locations less towards an area in the middle of the first rectangular area or the second rectangular area, and more towards an area outside the first rectangular area or the second rectangular area. Examples of gradual curve functions include hyperbolic tangent, sine, polynomial functions and other functions.
[0128] In some examples, the middle region includes parts that can be designated as a left, a front and a right view. In these examples, the part designated as the left view can be located in the output video frame adjacent to the part designated as the front view. In addition, the part designated as the right view is located adjacent to the front view. In these examples, the left, front and right views can form a continuous area in the output video frame, where continuous means that pixels that are adjacent in the spherical representation are located adjacent to each other in the output video frame.
[0129] In some examples, the middle region includes a part that can be designated as a rear view. In these examples, the bottom region can be located on the output video frame adjacent to the part designated as the rear view, and the top region can also be
located adjacent to the rear view. In these examples, the bottom region and the top region can form an area in the output video frame that is substantially continuous.
[0130] In some examples, mapping the top region to a first rectangular area may include applying a gradual adjustment to an area where the first rectangular area is adjacent to a rectangular area of one or more rectangular areas. For example, where the first rectangular area is adjacent to another rectangular area, the sampling of pixels from the spherical video data can be shifted to better align with the pixels from another rectangular area. This gradual adjustment can be gradually decreased for pixel locations that are farthest from the edge of the first rectangular area. In some examples, the same gradual adjustment can be applied to the second rectangular area.
[0131] In some examples, the output video frame has a three-by-two aspect ratio. A three-by-two aspect ratio can be coded more efficiently than other proportions. In some examples, the output video frame may be encoded, using, for example, the HEVC or AVC codec (or another codec) for storage and / or transmission.
[0132] Figure 11 is a flow diagram illustrating an exemplary process 1100 for processing video data according to the techniques discussed above. In 1102, process 1100 includes obtaining 360 degree video data, which includes a plurality of video frames, each video frame of the plurality of video frames including a two-dimensional representation of video data for the video frame. In some examples, the 360 degree video data can be obtained from an encoded bit stream. The encoded bit stream may have been read from a storage location and/or may have been received from a transmission. In these examples, the bit stream can be decoded into rectangular video frames.
[0133] In 1104, process 1100 includes identifying a first rectangular area of a video frame from the plurality of video frames. In 1106, process 1100 includes mapping the first rectangular area in a top region of a spherical representation of video data to the video frame. The top region can comprise a first circular area of the spherical representation. Mapping the first rectangular area can include arranging video data from the first rectangular area in the first circular area.
[0134] In 1108, process 1100 includes identifying a second rectangular area of the video frame. In 1110, process 1100 includes mapping the second rectangular area in a region below the spherical representation. The bottom region can comprise a second circular area of the spherical representation. Mapping the second rectangular area can include arranging the video data from the second rectangular area in the second circular area.
[0135] The top region can include, for example, a surface of the spherical representation that is above a first latitude of the spherical representation. As another example, the bottom region can include a surface of the spherical representation below a second
latitude of the spherical representation. In this example, the first latitude and the second latitude can be equidistant from an equator of the spherical representation. In some instances, the latitudes are at 45 degrees from the equator. In some instances, the latitudes are at an angle greater than or less than 45 degrees.
[0136] In some examples, mapping the first rectangular area and mapping the second rectangular area includes selecting a point on the spherical representation and determining a pixel location in the video frame that corresponds to the point. The pixel location can be determined using a mapping to convert a circle to a square, such as a square mapping, an elliptical arc mapping or other mapping. These mappings can result in a circle being compressed or made into a square. Mapping the first and second rectangular areas can additionally include sampling a pixel from the pixel location and placing the sampled pixel at the point on the spherical representation.
[0137] In some examples, mapping to convert a circle to a square reverses the distortion caused when the video data in the first rectangular area or the second rectangular area has been expanded to fill the first rectangular area or the second rectangular area. For example, the first and second rectangular areas may have been filled with pixel data by converting a circular region of a spherical representation of the 360-degree video into a rectangular region, which may result in some visible distortion of the pixels.
By mapping pixels from a rectangular shape back to a circular shape, the distortion can be removed.
[0138] In some examples, mapping the first rectangular area and mapping the second rectangular area includes additionally adjusting the pixel location using a gradual curve function. For example, the gradual curve function can be used to locate a pixel in an area adjacent to at least one of the additional rectangular areas. In these examples, a seamless transition between pixels adjacent to the first or second rectangular areas and pixels in the first or second rectangular areas can be preserved when the pixels are mapped to the spherical representation. In some instances, the gradual curve function changes the pixel locations less for an area in the middle of the first rectangular area or the second rectangular area, and more towards an area outside the first rectangular area or the second rectangular area.
[0139] In some implementations, process 1100 includes mapping one or more additional rectangular areas of the video frame to a region in the middle of the spherical representation. Mapping one or more additional rectangular areas can include, for example, selecting a point on the spherical representation and determining a pixel location in the video frame that corresponds to the point. The pixel location can be determined using a mapping to convert a three-dimensional sphere into a two-dimensional rectangle, such as an equirectangular projection, a cubic map projection or other projection. Mapping one or more additional rectangular areas can additionally include the
sampling of a pixel from the pixel location and the placing of the sampled pixel at the point on the spherical representation.
[0140] In some examples, the one or more additional rectangular areas include areas that can be designated as a left view, a front view and a right view. In these examples, the area designated as the left view can be located adjacent to the area designated as the front view and the area designated as the right view can also be located adjacent to the front view. In these examples, the left, front and right views can form a continuous area in the video frame.
[0141] In some examples, one or more additional rectangular areas include an area that can be designated as a rear view. In these examples, the first rectangular area can be adjacent to the area designated as the rear view, and the second rectangular area can also be adjacent to the rear view. In these examples, the first rectangular area, the rear view and the second rectangular area can form a continuous area in the video frame.
[0142] In some examples, mapping the first rectangular area in the top region may include applying a gradual adjustment to an area where the first rectangular area is adjacent to another rectangular area. In these examples, the pixel locations in the video frame may have been shifted, so that a continuous transition between the first rectangular area and the other rectangular area is produced. This continuous transition can be
preserved in the spherical representation by applying gradual adjustment when pixels are mapped from the video frame to the spherical representation. A similar gradual adjustment can also be applied to the second rectangular area.
[0143] In some examples, processes 1000, 1100 can be performed by a computing device or an apparatus, such as a video encoding device (such as, for example, the encoding device 104 and / or the decoding device 112). A video encoding device can include, for example, a video encoding system and / or a video decoding system. In some cases, the computing device or device may include a processor, microprocessor, microcomputer, or other component of a device that is configured to conduct process steps 1000, 1100. In some instances, the computing device or device may include a camera configured to capture video data (such as a video stream) that includes video frames. For example, the computing device can include a camera device (such as an IP camera or other type of camera device) that can include a video codec. In some instances, a camera or other capture device that captures video data is separate from the computing device, in which case the computing device receives the captured video data. The computing device may additionally include a network interface configured to communicate video data. The network interface can be configured to communicate data based on
Internet Protocol (IP).
[0144] Processes 1000, 1100 are illustrated as logic flow diagrams, whose operation represents a sequence of operations that can be implemented in hardware, computer instructions or a combination of them. In the context of computer instructions, operations represent computer executable instructions, stored in one or more computer-readable storage media that, when executed by one or more processors, perform the listed operations. Generally, computer executable instructions include routines, programs, objects, components, data structures and the like, that perform specific functions or implement specific types of data. The order in which the operations are described is not intended to be interpreted as a limitation, and any number of the operations described can be combined in any order and / or in parallel to implement the processes.
[0145] In addition, processes 1000, 1100 can be performed under the control of one or more computer systems configured with executable instructions and can be implemented as code (such as, for example, executable instructions, one or more computer programs or one or more applications) collectively executed on one or more processors, by hardware or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program that comprises a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.
[0146] Video data captured by a camera (such as a fisheye camera or other suitable camera or cameras) can be encoded to reduce the amount of data needed for transmission and storage. Encoding techniques can be implemented in an example of a video encoding and decoding system. In some examples, a system includes a source device that provides encoded video data to be decoded at a later time by a destination device. Specifically, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can include any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[0147] A video encoding system, which includes an encoding system and / or a decoding system, can be used to encode and / or decode video data. An example of a
video encoding and decoding system includes a source device that provides encoded video data to be decoded later by a destination device. Specifically, the source device provides the video data to the destination device via a computer-readable medium. The source device and the destination device can include any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called smart phones, so-called smart pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some cases, the source device and the destination device may be equipped for wireless communication.
[0148] The target device can receive the encoded video data to be decoded via the computer-readable medium. The computer-readable medium can comprise any type of medium or device capable of moving the encoded video data from the source device to the destination device. In one example, the computer-readable medium may comprise a communication medium for enabling the source device to transmit encoded video data directly to the destination device in real time. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the
destination device. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations or other equipment that may be useful to facilitate communication from the source device to the destination device.
[0149] In some examples, encoded data can be transmitted from the output interface to a storage device. In the same way, encoded data can be accessed from the storage device via the input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory or any other digital storage media for storing encoded video data. In a further example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated by the source device. The destination device can access stored video data from the storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device. Exemplary file servers include a web server (such as for a website), an FTP server, network attached storage (NAS) devices or a local disk drive. The destination device can access the encoded video data via any standard data connection, including an Internet connection. This may include a wireless channel (such as a Wi-Fi connection), a wired connection (such as DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a streaming transmission, a download transmission or a combination thereof.
[0150] The techniques of this disclosure are not necessarily limited to wireless applications or configurations. The techniques described can be applied to video encoding in support of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet streaming video transmissions, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some instances, the system can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting and/or video telephony.
[0151] In one example, the source device includes a video source, a video encoder and an output interface. The target device may include an input interface, a video decoder and a display device. The video encoder of the source device can be configured to apply the techniques disclosed herein. In other examples, a source device and a target device can include other components or arrangements. For example, the source device can receive video data from an external video source, such as an external camera. In the same way, the target device can interface with an external display device, instead of including an integrated display device.
[0152] The exemplary system above is merely one example. Techniques for processing video data in parallel can be performed by any digital video encoding and/or decoding device. While the techniques of this disclosure are generally performed by a video encoding device, the techniques can also be performed by a video encoder/decoder, typically referred to as a CODEC. Furthermore, the techniques of this disclosure can also be performed by a video preprocessor. The source device and the destination device are merely examples of such encoding devices, in which the source device generates encoded video data for transmission to the destination device. In some examples, the source and destination devices may operate in a substantially symmetrical manner, so that each of the devices includes video encoding and decoding components. Consequently, exemplary systems can support one-way or two-way video transmission between video devices,
for example, for video streaming, video playback, video broadcasting or video telephony.

[0153] The video source can include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, the video source can generate computer graphics-based data as the source video, or a combination of live video, archived video and computer-generated video. In some cases, if the video source is a video camera, the source device and the destination device can form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video encoding in general, and can be applied to wireless and/or wired applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder. The encoded video information can then be transmitted via the output interface to a computer-readable medium.
[0154] As noted, the computer-readable medium may include transient media, such as a wireless broadcast or a wired network transmission, or storage media (that is, non-transitory storage media), such as a hard disk, flash drive, compact disc, digital video disc, Blu-ray disc or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device and provide the encoded video data to the destination device, for example, via network transmission. Likewise, a computing device of a media production facility, such as a disc stamping facility, can receive encoded video data from the source device and produce a disc containing the encoded video data. Therefore, the computer-readable medium can be understood as including one or more computer-readable media of various forms, in various examples.
[0155] Those skilled in the art will appreciate that the less-than ('<') and greater-than ('>') symbols or terminology used herein can be replaced with less-than-or-equal-to ('≤') and greater-than-or-equal-to ('≥') symbols, respectively, without departing from the scope of this disclosure.
[0156] Details specific to an encoding device 104 and a decoding device 112 are shown in Figure 12 and Figure 13, respectively. Figure 12 is a block diagram illustrating an exemplary coding device 104 that can
implement one or more of the techniques described in this disclosure. The encoding device 104 can, for example, generate the syntax structures described herein (such as, for example, the syntax structures of a VPS, SPS, PPS or other syntax elements). The encoding device 104 can perform intra-prediction and inter-prediction coding of video blocks within video slices. As previously described, intra-coding depends, at least in part, on spatial prediction to reduce or remove spatial redundancy within a given video frame or image. Inter-coding depends, at least in part, on temporal prediction to reduce or remove temporal redundancy within adjacent frames or images of a video sequence. Intra-mode (I mode) can refer to any of several spatial-based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), can refer to any of several temporal-based compression modes.
[0157] The encoding device 104 includes a partitioning unit 35, the prediction processing unit 41, the filter unit 63, the image memory 64, the adder 50, the transform processing unit 52, the quantization unit 54 and the entropy coding unit 56. The prediction processing unit 41 includes the motion estimation unit 42, the motion compensation unit 44 and the intra-prediction processing unit 46. For video block reconstruction, the encoding device 104 also includes the inverse quantization unit 58, the inverse transform processing unit 60 and the adder 62. The filter unit 63 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although filter unit 63 is shown in Figure 12 as an in-loop filter, in other configurations, filter unit 63 can be implemented as a post-loop filter. A post-processing device 57 can perform additional processing on encoded video data generated by the encoding device 104. The techniques of this disclosure can, in some cases, be implemented by the encoding device 104. In other cases, however, one or more of the techniques of this disclosure can be implemented by the post-processing device 57.
[0158] As shown in Figure 12, the encoding device 104 receives video data and the partitioning unit 35 partitions the data into video blocks. Partitioning can also include partitioning into slices, slice segments, tiles or other larger units, as well as partitioning into video blocks, such as, for example, according to a quad-tree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice can be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). The prediction processing unit 41 can select one of a
plurality of possible encoding modes, such as one of a plurality of intra-prediction encoding modes or one of a plurality of inter-prediction encoding modes, for the current video block based on error results (such as, for example, encoding rate and level of distortion or the like). The prediction processing unit 41 can provide the resulting intra- or inter-coded block to the adder 50 to generate residual block data and to the adder 62 to reconstruct the coded block for use as a reference image.
[0159] The intra-prediction processing unit 46 within the prediction processing unit 41 can perform intra-predictive encoding of the current video block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to provide spatial compression. The motion estimation unit 42 and the motion compensation unit 44 within the prediction processing unit 41 perform inter-predictive encoding of the current video block with respect to one or more predictive blocks in one or more reference images to provide temporal compression.
[0160] The motion estimation unit 42 can be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern can designate video slices in the sequence as P slices, B slices or GPB slices. The motion estimation unit 42 and the motion compensation unit 44 can be highly integrated, but are shown separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate motion for video blocks. A motion vector, for example, can indicate the displacement of a prediction unit (PU) of a video block within a current video frame or image relative to a predictive block within a reference image.
[0161] A predictive block is a block that is found to closely match the PU of the video block to be encoded in terms of pixel difference, which can be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD) or other difference metrics. In some examples, the encoding device 104 can calculate values for sub-integer pixel positions of reference images stored in image memory 64. For example, the encoding device 104 can interpolate values of quarter-pixel positions, eighth-pixel positions or other fractional pixel positions of the reference image. Therefore, the motion estimation unit 42 can perform a motion search with respect to full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
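As an illustration of the SAD metric and the integer-pel part of the motion search mentioned above, a minimal sketch follows; the exhaustive search window, the numpy arrays and the function names are assumptions for illustration, and sub-pixel interpolation is omitted.

import numpy as np

def sad(block, candidate):
    # Sum of absolute differences between a block and a candidate predictive block.
    return int(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())

def best_integer_mv(block, ref, x0, y0, search_range=8):
    # Exhaustive integer-pel motion search around (x0, y0) in a reference image.
    h, w = block.shape
    best = (0, 0, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                cost = sad(block, ref[y:y + h, x:x + w])
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best  # (mv_x, mv_y, sad)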
[0162] Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-encoded slice, comparing the position of the PU with the position of a predictive block of a reference image. The reference image can be selected from
a first list of reference images (List 0) or a second list of reference images (List 1), each of which identifies one or more reference images stored in image memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy coding unit 56 and the motion compensation unit 44.
[0163] Motion compensation, performed by the motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by the motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, the motion compensation unit 44 can locate the predictive block to which the motion vector points in one of the reference image lists. The encoding device 104 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being encoded, forming pixel difference values. The pixel difference values form residual data for the block and can include both luma and chroma difference components. The adder 50 represents the component or components that perform this subtraction operation. The motion compensation unit 44 can also generate syntax elements associated with the video blocks and the video slice for use by the decoding device 112 in decoding the video blocks of the video slice.
[0164] The intra-prediction processing unit 46 can intra-predict the current block, as an alternative to the inter-prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction processing unit 46 can determine the intra-prediction mode to be used to encode the current block. In some examples, the intra-prediction processing unit 46 may encode the current block using various intra-prediction modes, such as during separate encoding passes, and the intra-prediction processing unit 46 (or the mode selection unit 40, in some examples) can select an appropriate intra-prediction mode to use from the tested modes. For example, the intra-prediction processing unit 46 can calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes and select the intra-prediction mode that has the best rate-distortion characteristics among the tested modes. The rate-distortion analysis generally determines the degree of distortion (or error) between a coded block and an original, non-coded block that was encoded to produce the coded block, as well as the bit rate (that is, the number of bits) used to produce the coded block. The intra-prediction processing unit 46 can calculate ratios from the distortions and rates for the various coded blocks in order to determine which intra-prediction mode has the best rate-distortion value for the block.
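The rate-distortion selection described above is often expressed as a Lagrangian cost; the sketch below uses that common formulation (cost = distortion + lambda * bits) as an assumption, since the exact ratio computation is not spelled out here.

def select_intra_mode(candidates, lam):
    # candidates: iterable of (mode, distortion, bits) tuples, where distortion
    # is the error between the original and reconstructed block and bits is the
    # number of bits used to encode the block with that mode.
    best_mode, best_cost = None, float("inf")
    for mode, distortion, bits in candidates:
        cost = distortion + lam * bits  # Lagrangian rate-distortion cost
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode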
[0165] In any case, after selecting an intra-prediction mode for a block, the intra-prediction processing unit 46 can provide information indicating the intra-prediction mode selected for the block to the entropy coding unit 56. The entropy coding unit 56 can encode information that indicates the selected intra-prediction mode. The encoding device 104 may include, in the transmitted bit stream, configuration data, which may include definitions of encoding contexts for various blocks, as well as indications of a most probable intra-prediction mode, an intra-prediction mode index table and a modified intra-prediction mode index table to use for each of the contexts. The bit stream configuration data may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables).
[0166] After the prediction processing unit 41 generates the predictive block for the current video block through inter-prediction or intra-prediction, the encoding device 104 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block can be included in one or more TUs and applied to the transform processing unit 52. The transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. The transform processing unit 52 can convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
[0167] The transform processing unit 52 can send the resulting transform coefficients to the quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the bit rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting a quantization parameter. In some examples, the quantization unit 54 can then perform a scan of the matrix that includes the quantized transform coefficients. Alternatively, the entropy coding unit 56 can perform the scan.
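A simple uniform quantizer illustrates how the quantization parameter controls the degree of quantization; the step-size relation shown (roughly doubling every 6 QP, as in AVC/HEVC) and the rounding rule are illustrative assumptions rather than the exact design of quantization unit 54.

import numpy as np

def quantize(coeffs, qp):
    # Uniform quantization of transform coefficients; a larger QP gives a
    # coarser step size and therefore fewer bits but more distortion.
    step = 2.0 ** ((qp - 4) / 6.0)
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / step + 0.5)

def dequantize(levels, qp):
    # Inverse quantization, as performed before the inverse transform.
    step = 2.0 ** ((qp - 4) / 6.0)
    return levels * step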
[0168] After quantization, the entropy coding unit 56 entropy encodes the quantized transform coefficients. For example, the entropy coding unit 56 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy coding technique. Following the entropy coding by the entropy coding unit 56, the encoded bit stream can be transmitted to the decoding device 112 or archived for later transmission or retrieval by the decoding device 112. The entropy coding unit 56 can also entropy encode the motion vectors and other syntax elements for the current video slice being encoded.
[0169] The inverse quantization unit 58 and the inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference image. The motion compensation unit 44 can calculate a reference block by adding the residual block to a predictive block of one of the reference images within one of the reference image lists. The motion compensation unit 44 can also apply one or more interpolation filters to the reconstructed residual block in order to calculate sub-integer pixel values for use in motion estimation. The adder 62 adds the reconstructed residual block to the motion-compensated prediction block produced by the motion compensation unit 44 in order to produce a reference block for storage in the image memory 64. The reference block can be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or image.
[0170] In this manner, the encoding device 104 of Figure 12 represents an exemplary video encoder configured to generate syntax for an encoded video bit stream. The encoding device 104 can, for example, generate sets of VPS, SPS and PPS parameters as described above. The encoding device 104 can perform any of the techniques described herein, including the processes described above. The techniques of this disclosure have generally been described with respect to the encoding device 104, but as mentioned above, some of the techniques of this disclosure can also be implemented by the post-processing device 57.
[0171] Figure 13 is a block diagram illustrating an exemplary decoding device 112. The decoding device 112 includes an entropy decoding unit 80, the prediction processing unit 81, the inverse quantization unit 86, the inverse transform processing unit 88, the adder 90, the filter unit 91 and the image memory 92. The prediction processing unit 81 includes the motion compensation unit 82 and the intra-prediction processing unit 84. The decoding device 112 may, in some examples, perform a decoding pass that is generally reciprocal to the encoding pass described with respect to the encoding device 104 of Figure 12.
[0172] During the decoding process, the decoding device 112 receives an encoded video bit stream that represents video blocks from one
encoded video slice and associated syntax elements sent by the encoding device 104. In some embodiments, the decoding device 112 may receive the encoded video bit stream from the encoding device 104. In some embodiments, the decoding device 112 can receive the bit stream from a network entity 79, such as a server, a media-aware network element (MANE), a video editor/splicer or another device that is configured to implement one or more of the techniques described above. Network entity 79 may or may not include the encoding device 104. Some of the techniques described in this disclosure may be implemented by the network entity 79 before the network entity 79 transmits the encoded video bit stream to the decoding device 112. In some video decoding systems, the network entity 79 and the decoding device 112 may be parts of separate devices, while in other cases the functionality described in relation to the network entity 79 can be performed by the same device that comprises the decoding device 112.
[0173] The entropy decoding unit 80 of the decoding device 112 entropy decodes the bit stream to generate quantized coefficients, motion vectors and other syntax elements. The entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction processing unit 81. The decoding device 112 can receive the syntax elements at the video slice level and/or the video block level. The entropy decoding unit 80 can process and parse both fixed-length syntax elements and variable-length syntax elements in one or more sets of parameters, such as a VPS, SPS and PPS.
[0174] When the video slice is encoded as an intra-coded slice (I), the intra-prediction processing unit 84 of the prediction processing unit 81 can generate prediction data for a video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or image. When the video frame is encoded as an inter-coded slice (i.e., a B, P or GPB slice), the motion compensation unit 82 of the prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from the entropy decoding unit 80. The predictive blocks can be produced from one of the reference images within one of the reference image lists. The decoding device 112 can construct the reference frame lists, List 0 and List 1, using default construction techniques based on the reference images stored in the image memory 92.
[0175] The motion compensation unit 82 determines the prediction information for a video block of the current video slice, parsing the motion vectors and other syntax elements and using the
prediction information to produce the predictive blocks for the current video block being decoded. For example, the motion compensation unit 82 can use one or more syntax elements in a set of parameters to determine a prediction mode (such as, for example, intra- or inter-prediction) used to encode the video blocks of the video slice, an inter-prediction slice type (such as a B slice, a P slice or a GPB slice), construction information for one or more of the reference image lists for the slice, motion vectors for each inter-encoded video block of the slice, the inter-prediction status for each inter-encoded video block of the slice, and other information to decode the video blocks in the current video slice.
[0176] The motion compensation unit 82 can also perform interpolation based on interpolation filters. The motion compensation unit 82 can use interpolation filters as used by the encoding device 104 during the encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, the motion compensation unit 82 can determine the interpolation filters used by the encoding device 104 from the received syntax elements and can use the interpolation filters to produce predictive blocks.
[0177] The inverse quantization unit 86 inverse quantizes, or dequantizes, the quantized transform coefficients provided in the bit stream and decoded by the entropy decoding unit 80. The inverse quantization process may include the use of a quantization parameter calculated by the encoding device 104 for each video block in the video slice in order to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. The inverse transform processing unit 88 applies an inverse transform (such as an inverse DCT or other suitable inverse transform), an inverse integer transform, or a conceptually similar inverse transform process to the transform coefficients in order to produce residual blocks in the pixel domain.
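A minimal sketch of these two steps is given below. It assumes the common approximation that the quantizer step size doubles roughly every six quantization parameter steps and uses a floating-point inverse DCT in place of the standard's integer transform; the function names and scaling are illustrative rather than the normative derivation.

```python
import numpy as np
from scipy.fftpack import idct

def dequantize(levels: np.ndarray, qp: int) -> np.ndarray:
    """Scale quantized transform levels back up. The step size follows the
    approximate relation step ~ 2 ** ((QP - 4) / 6); the standard instead
    uses integer level-scale tables and bit-depth dependent offsets."""
    return levels * (2.0 ** ((qp - 4) / 6.0))

def inverse_transform(coeffs: np.ndarray) -> np.ndarray:
    """Apply a 2-D inverse DCT to produce a residual block in the pixel
    domain (a stand-in for the standard's integer inverse transform)."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")
```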
[0178] After the motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, the decoding device 112 forms a decoded video block by summing the residual blocks from the inverse transform processing unit 88 with the corresponding predictive blocks generated by the motion compensation unit 82. The adder 90 represents the component or components that perform this summation operation. If desired, loop filters (either in the coding loop or after the coding loop) can also be used to smooth the transitions between pixels or otherwise improve the video quality. The filter unit 91 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF) and a sample adaptive offset (SAO) filter. Although the filter unit 91 is shown in Figure 13 as an in-loop filter, in other configurations the filter unit 91 can be implemented as a post-loop filter. The video blocks decoded in a given frame or image are then stored in the image memory 92, which stores reference images used for subsequent motion compensation. The image memory 92 also stores decoded video for later presentation on a display device.
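The reconstruction itself reduces to a clipped addition of the prediction and the residual, as in the illustrative sketch below; loop filtering and storage in the image memory would follow. The function name and the 8-bit default are assumptions made for the example.

```python
import numpy as np

def reconstruct_block(prediction: np.ndarray, residual: np.ndarray,
                      bit_depth: int = 8) -> np.ndarray:
    """Add the residual block to the predictive block and clip the result
    to the valid sample range. Loop filtering (deblocking, SAO, ALF) and
    storage in the reference image memory would follow this step."""
    max_val = (1 << bit_depth) - 1
    rec = prediction.astype(np.int32) + np.rint(residual).astype(np.int32)
    return np.clip(rec, 0, max_val)
```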
[0179] In the preceding description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the invention is not limited thereto. Thus, although illustrative embodiments of the application have been described in detail herein, it should be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. Various features and aspects of the invention described above can be used individually or jointly. In addition, the embodiments can be used in any number of environments and applications beyond those described here, without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For purposes of illustration, the methods were described in a specific order. It should be understood that, in alternative embodiments, the methods can be performed in an order different from that described.
[0180] Where components are described as being configured to perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (such as microprocessors or other electronic circuits) to perform the operation, or any combination thereof.
[0181] The various illustrative logic blocks, modules, circuits and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, firmware or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, modules, circuits and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the specific application and the design constraints imposed on the system as a whole. Those skilled in the art can implement the described functionality in varying ways for each specific application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
[0182] The techniques described here can also be implemented in electronic hardware, computer software, firmware or any combination thereof. Such techniques can be implemented in any of a variety of devices, such as general purpose computers, wireless communication device handsets or integrated circuit devices that have multiple uses, including applications in wireless communication device handsets and other devices. Any features described as modules or components can be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques can be performed, at least in part, by a computer-readable data storage medium comprising program code that includes instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM), such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques may additionally, or alternatively, be performed, at least in part, by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read and/or executed by a computer, such as propagated signals or waves.
[0183] The program code can be executed by a processor, which can include one or more
processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other discrete or integrated logic circuits. Such a processor can be configured to perform any of the techniques described in this disclosure. A general purpose processor can be a microprocessor; but, in the alternative, the processor can be any conventional processor, controller, microcontroller or state machine. A processor can also be implemented as a combination of computing devices, such as, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term processor, as used herein, can refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described here can be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated into a combined video encoder/decoder (CODEC).
Claims (29)
[1]
1. Method for processing video data, which comprises:
obtaining 360 degree video data, which includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
segmenting a video frame from the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the spherical representation, in which the middle region includes an area of the spherical representation not included in the top region or the bottom region;
mapping the top region to a first rectangular area of an output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area; and mapping the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[2]
2. Method according to claim 1, in which the video frame is segmented at a first latitude above an equator of the spherical representation and a second
latitude below the equator, where the first latitude and the second latitude are equidistant from the equator, where the top region is above the first latitude and where the bottom region is below the second latitude.
[3]
3. Method according to claim 1, in which mapping the top region and mapping the bottom region includes:
selecting a pixel location in the output video frame;
determining a point on the spherical representation that corresponds to the pixel location, where the point on the spherical representation is determined using a mapping for converting a square to a circle;
sampling a pixel from the point on the spherical representation; and placing the sampled pixel at the pixel location.
[4]
4. Method according to claim 3, in which the mapping for converting a square to a circle minimizes distortion in the output video frame.
[5]
5. Method according to claim 3, in which mapping the top region and mapping the bottom region also includes:
adjusting the pixel location using a gradual curve function.
[6]
6. A method according to claim 5, wherein the gradual curve function is used at pixel locations in an area adjacent to the additional rectangular areas in the output video frame.
[7]
7. Method according to claim 5, in which the gradual curve function changes pixel locations less toward the middle area of the first rectangular area or the second rectangular area, and more toward an area outside the first rectangular area or the second rectangular area.
[8]
8. Method according to claim 1, further comprising:
mapping the middle region to one or more rectangular areas of the output video frame.
[9]
9. The method of claim 8, wherein the middle region includes a left view, a front view and a right view, wherein the left view is located on the output video frame adjacent to the front view and where the right view is located adjacent to the front view.
[10]
10. The method of claim 1, wherein the middle region includes a rear view, where the bottom region is located on the output video frame adjacent to the rear view, and where the top region is located adjacent to the rear view.
[11]
11. Method according to claim 1, in which mapping the top region to the first rectangular area includes applying a gradual adjustment to an area where the first rectangular area is adjacent to a third rectangular area in the output video frame, and in which mapping the bottom region to the second rectangular area includes applying a gradual adjustment to an area where the second rectangular area is adjacent to a fourth rectangular area in the output video frame.
[12]
12. Method according to claim 1, in which the output video frame has a three-by-two aspect ratio.
[13]
13. Video encoding device comprising:
a memory configured to store 360 degree video data that includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame; and a processor configured to:
segment a video frame from the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the spherical representation, in which the middle region includes an area of the spherical representation not included in the top region or the bottom region;
map the top region to a first rectangular area of an output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area; and map the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[14]
14. Non-transitory computer-readable medium that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform operations, which include:
obtaining 360 degree video data, which includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
segmenting a video frame from the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the spherical representation, in which the middle region includes an area of the spherical representation not included in the top region or the bottom region;
mapping the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area; and mapping the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[15]
15. Apparatus, comprising:
means for obtaining 360-degree video data, which includes a plurality of video frames, each video frame of the plurality of video frames including a spherical representation of video data for the video frame;
means for segmenting a video frame from the plurality of video frames into a top region, a middle region and a bottom region, the top region including a first circular area of the spherical representation, the bottom region including a second circular area of the spherical representation that is opposite the first circular area on the spherical representation, in which the middle region includes an area of the spherical representation not included in the top region or the bottom region;
means for mapping the top region to a first rectangular area of the output video frame, where mapping the top region includes expanding the video data included in the first circular area to fill the first rectangular area; and means for mapping the bottom region to a second rectangular area of the output video frame, where mapping the bottom region includes expanding the video data included in the second circular area to fill the second rectangular area.
[16]
16. Method for processing video data, which comprises:
obtaining 360 degree video data, which includes a plurality of video frames, each video frame from the plurality of video frames including a two-dimensional representation of video data for the video frame;
identifying a first rectangular area of a video frame from the plurality of video frames;
mapping the first rectangular area onto a top region of a spherical representation of video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data of the first rectangular area in the first circular area;
identifying a second rectangular area of the video frame; and mapping the second rectangular area onto a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[17]
17. The method of claim 16, wherein the top region includes a surface of the spherical representation above a first latitude of the spherical representation, wherein the bottom region includes a surface of the spherical representation below a second latitude of the
spherical representation, where the first latitude and the second latitude are equidistant from an equator of the spherical representation.
[18]
18. The method of claim 16, wherein mapping the first rectangular area and mapping the second rectangular area includes:
selecting a point on the spherical representation;
determining a pixel location in the video frame that corresponds to the point, where the pixel location is determined using a mapping to convert a circle to a square;
sampling a pixel from the pixel location; and placing the sampled pixel at the point.
[19]
19. The method of claim 18, wherein mapping to convert a circle to a square reverses the distortion caused when the video data in the first rectangular area or the second rectangular area has been expanded to fill the first rectangular area or the second rectangular area.
[20]
20. The method of claim 18, wherein mapping the first rectangular area and mapping the second rectangular area further includes:
adjusting the pixel location using a gradual curve function.
[21]
21. The method of claim 20, wherein the gradual curve function is used at pixel locations in an area adjacent to at least one of one or more additional rectangular areas.
[22]
22. The method of claim 20, wherein the gradual curve function changes pixel locations less toward an area in the middle of the first rectangular area or second rectangular area and more towards an area outside the first rectangular area or second rectangular area.
[23]
23. The method of claim 16, further comprising:
mapping one or more additional rectangular areas of the video frame onto a middle region of the spherical representation.
[24]
24. The method of claim 23, wherein the one or more additional rectangular areas include a left view, a front view and a right view, where the left view is located adjacent to the front view and where the right view is adjacent to the front view.
[25]
25. The method of claim 16, wherein one or more additional rectangular areas of the video frame include a rear view, in which the first rectangular area is adjacent to the rear view and in which the second rectangular area is adjacent to the rear view.
[26]
26. The method of claim 16, in which mapping the first rectangular area onto the top region includes applying a gradual adjustment to an area where the first rectangular area is adjacent to a third rectangular area in the video frame, and in which mapping the second rectangular area onto the bottom region includes applying a gradual adjustment to an area where the second rectangular area is adjacent to a fourth rectangular area in the video frame.
[27]
27. Video encoding device comprising:
a memory configured to store 360-degree video data, which includes a plurality of video frames, each video frame, from the plurality of video frames, including a two-dimensional representation of video data for the video frame; and a processor configured to:
identify a first rectangular area of a video frame from the plurality of video frames;
map the first rectangular area onto a top region of a spherical representation of video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data of the first rectangular area in the first circular area;
identify a second rectangular area of the video frame; and map the second rectangular area onto a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[28]
28. Non-transitory computer-readable medium that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform operations, which include:
obtaining 360 degree video data, which includes a plurality of video frames, each video frame, from the plurality of video frames, including a two-dimensional representation of video data for the video frame;
identifying a first rectangular area of a video frame from the plurality of video frames;
mapping the first rectangular area onto a top region of a spherical representation of video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data of the first rectangular area in the first circular area;
identifying a second rectangular area of the video frame; and mapping the second rectangular area onto a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area in the second circular area.
[29]
29. Apparatus, comprising:
means for obtaining 360 degree video data, which includes a plurality of video frames, each video frame, from the plurality of video frames, including a two-dimensional representation of video data for the video frame;
means for identifying a first rectangular area of a video frame from the plurality of video frames;
means for mapping the first rectangular area onto a top region of a spherical representation of video data for the video frame, where the top region comprises a first circular area of the spherical representation and where mapping the first rectangular area includes arranging the video data of the first rectangular area in the first circular area;
means for identifying a second rectangular area of the video frame; and means for mapping the second rectangular area on a bottom region of the spherical representation, where the bottom region comprises a second circular area of the spherical representation and where mapping the second rectangular area includes arranging the video data of the second rectangular area on second circular area.
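The claims above leave the particular square-to-circle mapping open. Purely as an illustration of how an output-frame pixel in the top face could be turned into a sampling point on the spherical top region, the sketch below uses the well-known elliptical square-to-disc mapping and assumes a 45-degree segmentation latitude; the function names, the choice of mapping and the assumed latitude are examples, not the mapping required by the claims.

```python
import numpy as np

def square_to_circle(a: float, b: float):
    """Elliptical square-to-disc mapping: maps (a, b) in [-1, 1] x [-1, 1]
    onto the unit disc, so that the corners of the square reach the circle
    and the full rectangular area is used."""
    x = a * np.sqrt(1.0 - 0.5 * b * b)
    y = b * np.sqrt(1.0 - 0.5 * a * a)
    return x, y

def top_face_pixel_to_sphere(u: int, v: int, face_size: int,
                             cap_latitude_deg: float = 45.0):
    """Map pixel (u, v) of the square top face to (latitude, longitude) on
    the spherical top region; cap_latitude_deg is the latitude at which
    the top region was segmented (assumed here for illustration)."""
    # Normalize pixel coordinates to [-1, 1].
    a = 2.0 * (u + 0.5) / face_size - 1.0
    b = 2.0 * (v + 0.5) / face_size - 1.0
    x, y = square_to_circle(a, b)
    r = np.hypot(x, y)                      # 0 at the pole, 1 at the boundary
    longitude = np.degrees(np.arctan2(y, x))
    latitude = 90.0 - r * (90.0 - cap_latitude_deg)
    return latitude, longitude
```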
Similar technologies:
Publication number | Publication date | Patent title
BR112019019191A2|2020-04-22|polar sphere projections for efficient 360-degree video compression
US10839480B2|2020-11-17|Sphere equator projection for efficient compression of 360-degree video
US10620441B2|2020-04-14|Viewport-aware quality metric for 360-degree video
US10319071B2|2019-06-11|Truncated square pyramid geometry and frame packing structure for representing virtual reality video content
US10915986B2|2021-02-09|Adaptive perturbed cube map projection
US10313664B2|2019-06-04|Adjusting field of view of truncated square pyramid projection for 360-degree video
US10699389B2|2020-06-30|Fisheye rendering with lens distortion correction for 360-degree video
TW201911864A|2019-03-16|Reduce seam artifacts in 360-degree video
BR112019010875A2|2019-10-01|signaling systems and methods of regions of interest
TW201911863A|2019-03-16|Reference map derivation and motion compensation for 360-degree video writing code
US20190273929A1|2019-09-05|De-Blocking Filtering Method and Terminal
BR112019019339A2|2020-04-14|video content signaling including subimage bit streams for video encoding
KR102373921B1|2022-03-11|Fisheye rendering with lens distortion correction for 360 degree video
Patent family:
Publication number | Publication date
SG11201907262QA|2019-10-30|
US20180276826A1|2018-09-27|
AU2018239451A1|2019-08-29|
US10957044B2|2021-03-23|
EP3603072A1|2020-02-05|
US20210166394A1|2021-06-03|
TW201840181A|2018-11-01|
CN110463205A|2019-11-15|
KR20190128689A|2019-11-18|
WO2018175614A1|2018-09-27|
Cited references:
Publication number | Filing date | Publication date | Applicant | Patent title

KR20180082312A|2017-01-10|2018-07-18|Samsung Electronics Co., Ltd.|Method and apparatus for transmitting stereoscopic video content|
US10999602B2|2016-12-23|2021-05-04|Apple Inc.|Sphere projected motion estimation/compensation and mode decision|
US11259046B2|2017-02-15|2022-02-22|Apple Inc.|Processing of equirectangular object data to compensate for distortion by spherical projections|
US10924747B2|2017-02-27|2021-02-16|Apple Inc.|Video coding techniques for multi-view video|
US10839480B2|2017-03-22|2020-11-17|Qualcomm Incorporated|Sphere equator projection for efficient compression of 360-degree video|
US10587800B2|2017-04-10|2020-03-10|Intel Corporation|Technology to encode 360 degree video content|
US11182639B2|2017-04-16|2021-11-23|Facebook, Inc.|Systems and methods for provisioning content|
US11093752B2|2017-06-02|2021-08-17|Apple Inc.|Object tracking in multi-view video|
US20190005709A1|2017-06-30|2019-01-03|Apple Inc.|Techniques for Correction of Visual Artifacts in Multi-View Images|
US10754242B2|2017-06-30|2020-08-25|Apple Inc.|Adaptive resolution and projection format in multi-direction video|
EP3425483A3|2017-07-07|2019-04-10|Accenture Global Solutions Limited|Intelligent object recognizer|
US11212438B2|2018-02-14|2021-12-28|Qualcomm Incorporated|Loop filter padding for 360-degree video coding|
US10715832B2|2018-03-16|2020-07-14|Mediatek Inc.|Method and apparatus of block partition for VR360 video coding|
US20200213570A1|2019-01-02|2020-07-02|Mediatek Inc.|Method for processing projection-based frame that includes at least one projection face and at least one padding region packed in 360-degree virtual reality projection layout|
US10614553B1|2019-05-17|2020-04-07|National Chiao Tung University|Method for spherical camera image stitching|
CN111666434B|2020-05-26|2021-11-02|武汉大学|Streetscape picture retrieval method based on depth global features|
Legal status:
2021-10-19| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Filing date | Patent title
US201762474767P| true| 2017-03-22|2017-03-22|
US201762528264P| true| 2017-07-03|2017-07-03|
US15/926,957|US10957044B2|2017-03-22|2018-03-20|Sphere pole projections for efficient compression of 360-degree video|
PCT/US2018/023604|WO2018175614A1|2017-03-22|2018-03-21|Sphere pole projections for efficient compression of 360-degree video|